This readme file contains instructions for reproducing the analyses in Hao T, Elith J, Lahoz-Monfort J, Guillera-Arroita G. Testing whether ensemble modelling is advantageous for maximising predictive performance of species distribution models. Ecography.

#####data download#####
The species and predictor data used in this study were compiled and used previously in Fithian, W. et al. 2015. Bias correction in species distribution models: pooling survey and collection data for multiple species. - Methods Ecol. Evol. 6: 424–438. They were made openly available with that paper. To reproduce the analyses in this study, first download the data from:
Fithian, W., Elith, J., Hastie, T. and Keith, D. 2014. Code and data supplement for "Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species". Stanford Digital Repository. Available at: http://purl.stanford.edu/vt558xk1600

#####data instructions#####
After downloading the data package from Fithian et al. (2015), load the file 'allData.RData' and save the objects 'PA' and 'ibra' to separate files:
PA.RData
ibra.RData
Then move both files to the 'Data' folder of this package.
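The package's scripts are written in R, so a minimal R sketch of this step follows. It assumes 'allData.RData' is in the working directory and that the package's 'Data' folder is reachable as 'Data/'; the helper name split_all_data is hypothetical and is not part of either package.

```r
# Hypothetical helper (not part of this package or Fithian et al. 2015):
# extract 'PA' and 'ibra' from allData.RData and save each to its own file.
split_all_data <- function(all_data_path = "allData.RData", out_dir = "Data") {
  # Load into a private environment so the global workspace stays clean
  env <- new.env()
  load(all_data_path, envir = env)

  # Stop early if either expected object is missing from the file
  missing <- setdiff(c("PA", "ibra"), ls(env))
  if (length(missing) > 0) {
    stop("Objects not found in ", all_data_path, ": ",
         paste(missing, collapse = ", "))
  }

  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # save(list = ...) writes each named object from 'env' under its own name
  save(list = "PA", envir = env, file = file.path(out_dir, "PA.RData"))
  save(list = "ibra", envir = env, file = file.path(out_dir, "ibra.RData"))

  invisible(file.path(out_dir, c("PA.RData", "ibra.RData")))
}

# Usage (paths are assumptions; adjust to where you unpacked the download):
# split_all_data("allData.RData", "Data")
```

Loading into a separate environment rather than the global workspace means the other objects in 'allData.RData' are discarded once the function returns.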

You will also need to copy the 'grids' folder from Fithian et al. (2015) into this package, but only the following subfolders are needed:
bc02
bc04
bc05
bc12
bc14
bc21
bc32
bc33
rjja
rsea
rugg
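A minimal R sketch of copying just these subfolders. The destination is an assumption: dest_parent is taken to be the package root, so check where 'load_var_stack.R' expects to find 'grids' before using it. The helper name copy_grid_subfolders is hypothetical.

```r
# Hypothetical helper: copy only the grid subfolders used in this study from
# the unpacked Fithian et al. (2015) 'grids' folder into <dest_parent>/grids.
copy_grid_subfolders <- function(src_grids, dest_parent = ".") {
  # Subfolders listed in this readme
  keep <- c("bc02", "bc04", "bc05", "bc12", "bc14", "bc21",
            "bc32", "bc33", "rjja", "rsea", "rugg")

  # Fail fast if any required subfolder is absent from the download
  missing <- keep[!dir.exists(file.path(src_grids, keep))]
  if (length(missing) > 0) {
    stop("Subfolders not found in ", src_grids, ": ",
         paste(missing, collapse = ", "))
  }

  dest_grids <- file.path(dest_parent, "grids")
  dir.create(dest_grids, showWarnings = FALSE, recursive = TRUE)

  # recursive = TRUE copies each subfolder together with all of its contents
  ok <- file.copy(file.path(src_grids, keep), dest_grids, recursive = TRUE)
  if (!all(ok)) stop("Failed to copy: ", paste(keep[!ok], collapse = ", "))

  invisible(file.path(dest_grids, keep))
}

# Usage (source path is an assumption):
# copy_grid_subfolders("path/to/fithian_download/grids", ".")
```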

#####scripts#####
In the 'Scripts' folder of this package, you will find the scripts necessary for reproducing the analyses.

'load_var_stack.R' loads the predictor variable data. It is sourced by the other scripts, so there is no need to run it on its own.

'MESS plot.R' reproduces the MESS (multivariate environmental similarity surface) plot described in Appendix 2.

The scripts 'checkerboard.R', 'latitudinal.R', 'checkerboard - thinned.R' and 'latitudinal - thinned.R' build the models and compute the evaluation statistics.
IMPORTANT: these scripts take a long time to run and are very memory intensive. We strongly recommend running only one script per R session and clearing memory before and after running each script. They also write many local files (more than 50 GB), so make sure enough disk space is available.
Note that, to reduce memory use, only the evaluation results are saved once the biomod models have finished running. If you want to examine the models themselves or their predictions, you will need to modify the code accordingly.
However, the BRT models fitted with the dismo package are kept in the local environment. These objects are memory intensive, so be aware of potential memory issues.
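One way to follow the "one script per session" advice is to launch each modelling script in its own fresh R process, so its memory is released when that process exits. This is a minimal sketch using base R's system2(); the helper name run_in_fresh_session is hypothetical, and the working directory the scripts expect (assumed here to be the package root) is an assumption to check against the scripts themselves.

```r
# Hypothetical helper: run one analysis script in a brand-new R process.
# Memory used by the script is returned to the OS when the process exits.
run_in_fresh_session <- function(script, workdir = ".") {
  rscript <- file.path(R.home("bin"), "Rscript")

  # Run from 'workdir' and restore the caller's working directory afterwards
  old <- setwd(workdir)
  on.exit(setwd(old), add = TRUE)

  # shQuote() protects file names containing spaces, e.g. 'checkerboard - thinned.R'
  status <- system2(rscript, shQuote(script))
  if (status != 0) warning(script, " exited with status ", status)
  invisible(status)
}

# Usage (script path relative to the assumed package root):
# run_in_fresh_session("Scripts/checkerboard.R")
```

If you prefer to work interactively instead, rm(list = ls()) followed by gc() clears the workspace and triggers garbage collection between runs, although this does not guarantee that R hands all memory back to the operating system.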

The remaining scripts, whose names end in '...tables.R', reproduce the result tables in both the main text and Appendix 2. The tables are printed in a LaTeX-friendly format.
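Because the tables are printed to the console, it can be convenient to capture them straight into .tex files. A minimal R sketch, assuming it is run from the package root and that the table scripts work when sourced from there; the helper name capture_tables and the output folder 'tables_out' are hypothetical.

```r
# Hypothetical helper: source every script in 'script_dir' whose name ends in
# "tables.R" and write everything it prints to a matching .tex file.
capture_tables <- function(script_dir = "Scripts", out_dir = "tables_out") {
  scripts <- list.files(script_dir, pattern = "tables\\.R$", full.names = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

  # e.g. "main tables.R" -> "tables_out/main tables.tex"
  outs <- file.path(out_dir, sub("\\.R$", ".tex", basename(scripts)))

  for (i in seq_along(scripts)) {
    # print.eval = TRUE also prints top-level values that would auto-print at
    # the console; local = new.env() keeps each script's objects separate
    capture.output(source(scripts[i], local = new.env(), print.eval = TRUE),
                   file = outs[i])
  }
  invisible(outs)
}

# Usage:
# capture_tables("Scripts", "tables_out")
```

Only standard output is captured; messages and warnings still go to the console.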